Simplified Implementation of Branch Target Preloading

ABSTRACT

A system that uses complex branch execution hardware and a hardware-based multiplexer (MUX) to multiplex the fetch address of a future branch and a branch fetch address into a single index hash value. The index hash value indexes a branch target prediction table used by a processor core, so that predicted branch targets can be preloaded to reduce branch mispredictions.

BACKGROUND OF THE INVENTION

The present invention relates in general to computers, and in particular to computer hardware. Still more particularly, the present invention relates to a system, method, and computer program for improving the efficiency of a processor by reducing branch mispredictions.

SUMMARY OF THE INVENTION

A system that uses complex branch execution hardware and a hardware-based multiplexer (MUX) to multiplex the fetch address of a future branch and a branch fetch address into a single index hash value. The index hash value indexes a branch target prediction table used by a processor core, so that predicted branch targets can be preloaded to reduce branch mispredictions.

The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objects and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

FIG. 1 is a block diagram of a data processing system in which the present invention may be implemented;

FIG. 2 depicts exemplary components of the overall system incorporating a multiplex device to preload prediction hash data into a branch target prediction table; and

FIG. 3 is a high-level logical flowchart of an exemplary set of steps performed to preload prediction hash data into a branch target prediction table.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference now to FIG. 1, there is depicted a block diagram of an exemplary computer 100 in which the present invention may be implemented. Computer 100 includes one or more processor cores 104 that are coupled to a system bus 106. A video adapter 108, which drives/supports a display 110, is also coupled to system bus 106. System bus 106 is coupled via a bus bridge 112 to an Input/Output (I/O) bus 114. An I/O interface 116 is coupled to I/O bus 114. I/O interface 116 affords communication with various I/O devices, including a keyboard 118, a mouse 120, a Compact Disk-Read Only Memory (CD-ROM) drive 122, a floppy disk drive 124, and a flash drive memory 126. The format of the ports connected to I/O interface 116 may be any known to those skilled in the art of computer architecture, including but not limited to Universal Serial Bus (USB) ports.

Computer 100 is able to communicate with a software deploying server 150 via a network 128 using a network interface 130, which is coupled to system bus 106. Network 128 may be an external network such as the Internet, or an internal network such as an Ethernet or a Virtual Private Network (VPN). Note that software deploying server 150 may utilize the same or a substantially similar architecture as computer 100.

A hard drive interface 132 is also coupled to system bus 106. Hard drive interface 132 interfaces with a hard drive 134. In a preferred embodiment, hard drive 134 populates a system memory 136, which is also coupled to system bus 106. System memory is defined as a lowest level of volatile memory in computer 100. This volatile memory includes additional higher levels of volatile memory (not shown), including, but not limited to, cache memory, registers and buffers. Data that populates system memory 136 includes computer 100's operating system (OS) 138 and application programs 144.

OS 138 includes a shell 140 for providing transparent user access to resources such as application programs 144. Generally, shell 140 is a program that provides an interpreter and an interface between the user and the operating system. More specifically, shell 140 executes commands that are entered into a command line user interface or from a file. Thus, shell 140 (also called a command processor) is generally the highest level of the operating system software hierarchy and serves as a command interpreter. The shell provides a system prompt, interprets commands entered by keyboard, mouse, or other user input media, and sends the interpreted command(s) to the appropriate lower levels of the operating system (e.g., a kernel 142) for processing. Note that while shell 140 is a text-based, line-oriented user interface, the present invention will equally well support other user interface modes, such as graphical, voice, gestural, etc.

As depicted, OS 138 also includes kernel 142, which includes lower levels of functionality for OS 138, including providing essential services required by other parts of OS 138 and application programs 144, including memory management, process and task management, disk management, and mouse and keyboard management.

Application programs 144 include a browser 146. Browser 146 includes program modules and instructions enabling a World Wide Web (WWW) client (i.e., computer 100) to send and receive network messages to the Internet using HyperText Transfer Protocol (HTTP) messaging, thus enabling communication with software deploying server 150.

Application programs 144 in computer 100's system memory (as well as software deploying server 150's system memory) also include a Branch Target Preloading Instruction (BTPI) 148. Execution of BTPI 148 by processor core 104 causes the preload path depicted in FIGS. 2-3 to be used.

Processor 200, which may or may not be a discrete chip, includes the hardware necessary to preload index hash information and incorporates the processor core(s) 104.

The hardware elements depicted in computer 100 are not intended to be exhaustive, but rather are representative to highlight components required by the present invention. For instance, computer 100 may include alternate memory storage devices such as magnetic cassettes, Digital Versatile Disks (DVDs), Bernoulli cartridges, and the like. These and other variations are intended to be within the spirit and scope of the present invention.

With reference now to FIG. 2, there is presented a high-level illustration of exemplary components of Processor 200, the overall system that incorporates a multiplex device to preload predicted branch targets into a branch target prediction table.

Traditionally, some processors include complex branch execution logic capable of executing complex branch instructions. Executing a complex branch instruction involves reading a Branch Computation Register and using its value to compute whether or not the branch should be taken. Examples of complex branch instructions are ‘decrement and branch if zero’, which decrements the branch computation register and branches based on the decremented value, and ‘compare and branch’, which compares the values of two branch computation registers and determines whether the branch will be taken based on the result of the compare.
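The two example instructions can be modeled in a few lines to show how each consumes branch computation registers to resolve the branch direction. This is a minimal behavioral sketch, not an ISA definition: the `BranchRegs` class, the two-register file, and the condition names are hypothetical, and real encodings (for example, which register is decremented, or whether the branch is taken on zero or nonzero) vary by architecture.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical condition codes for "compare and branch"; not taken from any real ISA.
_CONDITIONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "ge": operator.ge,
}


@dataclass
class BranchRegs:
    """Hypothetical file of branch computation registers."""
    bcr: list = field(default_factory=lambda: [0, 0])


def decrement_and_branch_if_zero(regs: BranchRegs, r: int) -> bool:
    """Decrement bcr[r]; the branch is taken when the decremented value is zero."""
    regs.bcr[r] -= 1
    return regs.bcr[r] == 0


def compare_and_branch(regs: BranchRegs, a: int, b: int, cond: str = "eq") -> bool:
    """Compare bcr[a] with bcr[b]; the branch is taken when the comparison holds."""
    return _CONDITIONS[cond](regs.bcr[a], regs.bcr[b])


if __name__ == "__main__":
    regs = BranchRegs([3, 7])
    # Counter starts at 3, so the branch is taken on the third execution.
    print([decrement_and_branch_if_zero(regs, 0) for _ in range(3)])
    print(compare_and_branch(regs, 0, 1, "lt"))
```

With the counter initialized to 3, this prints `[False, False, True]` and then `True`. In both cases the branch computation register is read on every execution, which is why its read port is already available for reuse by the preload path.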

In the present invention, while Software 202 is running, Software 202 stores the fetch address of a future branch instruction in Branch Computation Register 204, which is also used by Complex Branch Execution Logic 206. Branch Computation Register 204 may be a register specific to this purpose or a general purpose register. Software 202 also stores the predicted target address of the aforementioned future branch instruction in Branch Target Register 208. Branch Target Register 208 may likewise be a register specific to this purpose or a general purpose register.

Upon execution of a BTPI within Software 202, Control Logic 210 directs MUX 212 to select the input data from Branch Computation Register 204. Hash Logic 214 creates an index hash from the output of MUX 212. Alternatively, Hash Logic 214 could be placed before MUX 212, at the output of Branch Computation Register 204. The index hash is used as the write index into Branch Target Prediction Table 216. Branch Target Register 208 communicates the predicted target address of the future branch instruction to Branch Target Prediction Table 216 through a direct connection. Control Logic 210 then causes a write to Branch Target Prediction Table 216, which stores the predicted target address from Branch Target Register 208 at the location indicated by the index hash.

After this process is complete, Processor Core 104, when fetching a branch instruction, calculates a hash on the fetch address of the branch and reads the predicted target address stored at the calculated hash index in Branch Target Prediction Table 216. Logic internal to Processor Core 104 redirects fetching to the predicted target address if the branch is predicted taken, avoiding a branch misprediction.
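The datapath of FIG. 2 can be sketched as a small behavioral model: during a BTPI the MUX passes the Branch Computation Register, and during an ordinary fetch it passes the branch fetch address, so both paths land on the same table entry. This is an illustration under stated assumptions, not the patented circuit: the 64-entry table size, the XOR-folding hash, the untagged direct-mapped table, and all class and function names are hypothetical, since the description does not specify them.

```python
from typing import List, Optional

TABLE_BITS = 6                    # hypothetical: 64-entry Branch Target Prediction Table
TABLE_SIZE = 1 << TABLE_BITS
MASK = TABLE_SIZE - 1


def index_hash(addr: int) -> int:
    """Hypothetical Hash Logic 214: XOR-fold the address into TABLE_BITS bits."""
    h = 0
    while addr:
        h ^= addr & MASK
        addr >>= TABLE_BITS
    return h


def mux(branch_fetch_addr: int, bcr_value: int, select_bcr: bool) -> int:
    """MUX 212: pass the Branch Computation Register when select_bcr is set."""
    return bcr_value if select_bcr else branch_fetch_addr


class BranchTargetPreloader:
    """Hypothetical model of Processor 200's preload path."""

    def __init__(self) -> None:
        self.bcr = 0                                           # Branch Computation Register 204
        self.btr = 0                                           # Branch Target Register 208
        self.table: List[Optional[int]] = [None] * TABLE_SIZE  # Table 216 (untagged)

    def execute_btpi(self, btpi_fetch_addr: int) -> int:
        """Control Logic 210: select the BCR input, hash it, write the BTR there."""
        idx = index_hash(mux(btpi_fetch_addr, self.bcr, select_bcr=True))
        self.table[idx] = self.btr                             # direct BTR -> table write
        return idx

    def predict_target(self, branch_fetch_addr: int) -> Optional[int]:
        """Ordinary fetch: the MUX passes the branch fetch address; read the entry."""
        idx = index_hash(mux(branch_fetch_addr, self.bcr, select_bcr=False))
        return self.table[idx]


if __name__ == "__main__":
    core = BranchTargetPreloader()
    core.bcr, core.btr = 0x2000, 0x1000   # software: future branch and its target
    core.execute_btpi(btpi_fetch_addr=0x0500)
    print(hex(core.predict_target(0x2000)))  # preloaded entry
    print(core.predict_target(0x2004))       # a different branch, not preloaded
```

This prints `0x1000` and then `None`. Because the MUX only selects between its inputs, hashing each input before the MUX (the alternative placement of Hash Logic 214) yields the same index: `mux(index_hash(a), index_hash(b), s) == index_hash(mux(a, b, s))`. The sketch's table is untagged, so two branches whose addresses hash to the same index would share one entry; the description does not say whether the actual table carries tags.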

While Software 202 is running, it calculates the predicted target address and the fetch address of a future branch. However, because Branch Target Prediction Table 216 may not contain the predicted target address for this future branch, a preload instruction is used. For example, Software 202 determines the fetch address of the future branch to be “20” and the predicted target address of the same branch to be “10.” Software 202 then loads the predicted target address “10” into Branch Target Register 208 and the fetch address of the future branch, “20,” into Branch Computation Register 204. The software then executes a BTPI. Upon execution of the BTPI, the hardware instructs MUX 212 to select the read data of Branch Computation Register 204 and calculates an index hash from the selected data. This index hash is sent via a single transmission line to index Branch Target Prediction Table 216. Branch Target Prediction Table 216 is then written at the calculated index with the predicted target address “10” from Branch Target Register 208; that is, the target fetch address “10” is stored at the index given by a hash of the branch fetch address “20.” When Processor Core 104 later fetches the branch at address “20,” it can read the predicted target address from Branch Target Prediction Table 216 and predict the correct branch target.
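The worked example can be replayed to compare the first encounter of the branch with and without the preload. This sketch assumes a hypothetical 16-entry table indexed by the fetch address modulo 16, assumes the branch at “20” is actually taken to “10,” and treats an empty table entry as a missed redirect (a misprediction); `first_encounter` is an illustrative name only.

```python
from typing import Optional, Tuple

TABLE_SIZE = 16  # hypothetical table size


def index_hash(addr: int) -> int:
    """Hypothetical hash: fetch address modulo the table size."""
    return addr % TABLE_SIZE


def first_encounter(preload: bool, branch_addr: int = 20,
                    target: int = 10) -> Tuple[Optional[int], bool]:
    """Return (predicted target, prediction correct?) for a taken branch at branch_addr."""
    table: list = [None] * TABLE_SIZE           # Branch Target Prediction Table 216
    if preload:
        btr = target                            # software loads Branch Target Register 208
        bcr = branch_addr                       # software loads Branch Computation Register 204
        table[index_hash(bcr)] = btr            # BTPI: write BTR at hash(BCR)
    predicted = table[index_hash(branch_addr)]  # core later fetches the branch at "20"
    return predicted, predicted == target


if __name__ == "__main__":
    print(first_encounter(preload=True))    # (10, True)
    print(first_encounter(preload=False))   # (None, False)
```

Without the preload, the first execution of the branch finds no target and must be corrected after it resolves; with the preload, even the first execution is redirected to “10.”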

With reference now to FIG. 3, a high-level logical flowchart is presented of an exemplary set of steps performed to preload predicted target addresses into the branch target prediction table. After initiator block 300, the software loads the fetch address of the future branch instruction into the branch computation register, and the predicted target address of the future branch instruction into the branch target register (block 302). The software then executes a preload instruction (block 304). Upon execution of the preload instruction, the hardware signals the MUX that a preload instruction is being executed (block 306). The MUX then selects the fetch address of the future branch instruction from the branch computation register (block 308). Next, the hash logic creates an index hash from the output of the MUX (block 310). The predicted target address from the branch target register is then written into the branch target prediction table at the index given by the hash (block 312). When a branch instruction is later fetched, logic internal to the processor core reads the predicted target address stored in the branch target prediction table at the index given by a hash of the fetch address of the branch instruction (block 314). If the branch is predicted taken, the processor then begins fetching from the predicted target address (block 316). The process ends at terminator block 318.
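The ordering of blocks 302 through 316 can be traced with a short sketch that records each block number as it is reached. The direction prediction (`predicted_taken`) is treated as an input because the description does not say how the core decides that a branch is taken; the table size, the modulo hash, and the function name `fig3_flow` are hypothetical.

```python
from typing import List, Optional, Tuple


def fig3_flow(future_branch_addr: int, predicted_target: int,
              predicted_taken: bool, table_size: int = 64) -> Tuple[List[int], Optional[int]]:
    """Trace FIG. 3; return (blocks visited, redirected fetch address or None)."""
    def index_hash(addr: int) -> int:                 # hypothetical hash logic
        return addr % table_size

    table: List[Optional[int]] = [None] * table_size
    trace: List[int] = []

    bcr, btr = future_branch_addr, predicted_target   # block 302: software loads BCR and BTR
    trace.append(302)
    trace.append(304)                                 # block 304: software executes the BTPI
    select_bcr = True                                 # block 306: hardware signals the MUX
    trace.append(306)
    btpi_fetch_addr = 0                               # hypothetical other MUX input (unused)
    mux_out = bcr if select_bcr else btpi_fetch_addr  # block 308: MUX selects the BCR
    trace.append(308)
    idx = index_hash(mux_out)                         # block 310: hash the MUX output
    trace.append(310)
    table[idx] = btr                                  # block 312: write BTR at that index
    trace.append(312)

    target = table[index_hash(future_branch_addr)]    # block 314: read at hash(fetch address)
    trace.append(314)
    redirect = None
    if predicted_taken and target is not None:        # block 316: fetch from predicted target
        redirect = target
        trace.append(316)
    return trace, redirect                            # terminator block 318


if __name__ == "__main__":
    print(fig3_flow(20, 10, predicted_taken=True))
```

For the example addresses this prints `([302, 304, 306, 308, 310, 312, 314, 316], 10)`; when the branch is predicted not taken, block 316 is skipped and fetching continues sequentially.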

Although aspects of the present invention have been described with respect to a computer processor and software, it should be understood that at least some aspects of the present invention may alternatively be implemented as a program product for use with a data storage system or computer system. Programs defining functions of the present invention can be delivered to a data storage system or computer system via a variety of signal-bearing media, which include, without limitation, non-writable storage media (e.g., CD-ROM), writable storage media (e.g., a floppy diskette, hard disk drive, read/write CD-ROM, optical media), and communication media, such as computer and telephone networks including Ethernet. It should be understood, therefore, that such signal-bearing media, when carrying or encoding computer readable instructions that direct method functions of the present invention, represent alternative embodiments of the present invention. Further, it is understood that the present invention may be implemented by a system having means in the form of hardware, software, or a combination of software and hardware as described herein or their equivalent.

Having thus described the invention of the present application in detail and by reference to preferred embodiments thereof, it will be apparent that modifications and variations are possible without departing from the scope of the invention defined in the appended claims.

1. A computer processor comprising: a Multiplex device (MUX) comprised of two inputs and one output, wherein said first input is coupled to a branch fetch address or hash thereof, said second input is connected to a branch computation register read port or hash thereof, and said MUX output or hash thereof is coupled to a branch target prediction table; a branch target register; a logic in said computer processor, wherein said logic is capable of autonomously determining when a preload instruction is being executed, and autonomously causing said MUX to select said branch computation register read port and write data from said branch computation register read port or hash thereof and said branch fetch address or hash thereof into an index of said branch target prediction table indicated by said MUX output; and a processor core coupled to said branch target prediction table, wherein said processor core is capable of using said branch fetch address of a branch to index said branch target prediction table and begin fetching at a predicted target address stored in said branch target prediction table, if a predicted branch is taken.
2. The processor of claim 1, further comprising: logic for computing an index hash of said MUX output data; wherein said index hash input is coupled to said MUX output; and wherein said index hash output is directly coupled to said branch target prediction table.
3. The processor of claim 1, further comprising: logic for computing an index hash of said branch computation register output data; wherein said index hash input is coupled to said branch computation register output; and wherein said index hash output is directly coupled to said MUX input.
4. The processor of claim 1, further comprising: wherein said processor further comprises a complex branch execution logic which uses said branch computation register read port in the execution of one or more complex branch instructions.
5. The branch execution logic of claim 4, wherein said complex branch execution logic further comprises: a logic to execute a “decrement and branch if zero” type instruction.
6. The branch execution logic of claim 4, wherein said complex branch execution logic further comprises: a logic to execute a “compare and branch” type instruction.
7. A data processing system comprising: a memory; a system bus coupled to said memory; and a computer chip coupled to said system bus, wherein said computer chip comprises: a Multiplex device (MUX) comprised of two inputs and one output, wherein said first input is coupled to a branch fetch address or hash thereof, said second input is connected to a branch computation register read port or hash thereof, and said MUX output or hash thereof is coupled to a branch target prediction table; a branch target register; a logic in said computer processor, wherein said logic is capable of autonomously determining when a preload instruction is being executed, and autonomously causing said MUX to select said branch computation register read port and write data from said branch computation register read port or hash thereof and said branch fetch address or hash thereof into an index of said branch target prediction table indicated by said MUX output; and a processor core coupled to said branch target prediction table, wherein said processor core is capable of using said branch fetch address of a branch to index said branch target prediction table and begin fetching at a predicted target address stored in said branch target prediction table, if a predicted branch is taken.
8. The processor of claim 7, further comprising: logic for computing an index hash of said MUX output data; wherein said index hash input is coupled to said MUX output; and wherein said index hash output is directly coupled to said branch target prediction table.
9. The processor of claim 7, further comprising: logic for computing an index hash of said branch computation register output data; wherein said index hash input is coupled to said branch computation register output; and wherein said index hash output is directly coupled to said MUX input.
10. The processor of claim 7, further comprising: wherein said processor further comprises a complex branch execution logic which uses said branch computation register read port in the execution of one or more complex branch instructions.
11. The branch execution logic of claim 10, wherein said complex branch execution logic further comprises: a logic to execute a “decrement and branch if zero” type instruction.
12. The branch execution logic of claim 10, wherein said complex branch execution logic further comprises: a logic to execute a “compare and branch” type instruction.